Orthogonal Weight Normalization: Solution to Optimization over Multiple Dependent Stiefel Manifolds in Deep Neural Networks

نویسندگان

  • Lei Huang
  • Xianglong Liu
  • Bo Lang
  • Adams Wei Yu
  • Bo Li
چکیده

Orthogonal matrix has shown advantages in training Recurrent Neural Networks (RNNs), but such matrix is limited to be square for the hidden-to-hidden transformation in RNNs. In this paper, we generalize such square orthogonal matrix to orthogonal rectangular matrix and formulating this problem in feedforward Neural Networks (FNNs) as Optimization over Multiple Dependent Stiefel Manifolds (OMDSM). We show that the rectangular orthogonal matrix can stabilize the distribution of network activations and regularize FNNs. We also propose a novel orthogonal weight normalization method to solve OMDSM. Particularly, it constructs orthogonal transformation over proxy parameters to ensure the weight matrix is orthogonal and back-propagates gradient information through the transformation during training. To guarantee stability, we minimize the distortions between proxy parameters and canonical weights over all tractable orthogonal transformations. In addition, we design an orthogonal linear module (OLM) to learn orthogonal filter banks in practice, which can be used as an alternative to standard linear module. Extensive experiments demonstrate that by simply substituting OLM for standard linear module without revising any experimental protocols, our method largely improves the performance of the state-of-the-art networks, including Inception and residual networks on CIFAR and ImageNet datasets. In particular, we have reduced the test error of wide residual network on CIFAR-100 from 20.04% to 18.61% with such simple substitution. Our code is available online for result reproduction.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Generalized BackPropagation, Étude De Cas: Orthogonality

This paper introduces an extension of the backpropagation algorithm that enables us to have layers with constrained weights in a deep network. In particular, we make use of the Riemannian geometry and optimization techniques on matrix manifolds to step outside of normal practice in training deep networks, equipping the network with structures such as orthogonality or positive definiteness. Base...

متن کامل

A Neural Stiefel Learning based on Geodesics Revisited

In this paper we present an unsupervised learning algorithm of neural networks with p inputs and m outputs whose weight vectors have orthonormal constraints. In this setting the learning algorithm can be regarded as optimization posed on the Stiefel manifold, and we generalize the natural gradient method to this case based on geodesics. By exploiting its geometric property as a quotient space: ...

متن کامل

Multiple point of self-transverse immesions of certain manifolds

In this paper we will determine the multiple point manifolds of certain self-transverse immersions in Euclidean spaces. Following the triple points, these immersions have a double point self-intersection set which is the image of an immersion of a smooth 5-dimensional manifold, cobordant to Dold manifold $V^5$ or a boundary. We will show there is an immersion of $S^7times P^2$ in $mathbb{R}^{1...

متن کامل

Projection Based Weight Normalization for Deep Neural Networks

Optimizing deep neural networks (DNNs) often suffers from the ill-conditioned problem. We observe that the scaling-based weight space symmetry property in rectified nonlinear network will cause this negative effect. Therefore, we propose to constrain the incoming weights of each neuron to be unit-norm, which is formulated as an optimization problem over Oblique manifold. A simple yet efficient ...

متن کامل

A Hybrid Optimization Algorithm for Learning Deep Models

Deep learning is one of the subsets of machine learning that is widely used in Artificial Intelligence (AI) field such as natural language processing and machine vision. The learning algorithms require optimization in multiple aspects. Generally, model-based inferences need to solve an optimized problem. In deep learning, the most important problem that can be solved by optimization is neural n...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • CoRR

دوره abs/1709.06079  شماره 

صفحات  -

تاریخ انتشار 2017